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The rapid developing area of compressed sensing suggests that a sparse vector lying in an arbitrary high 
■ dimensional space can be accurately recovered from only a small set of non-adaptive linear measurements. Under 

appropriate conditions on the measurement matrix, the entire information about the original sparse vector is captured 
' in the measurements, and can be recovered using efficient polynomial methods. The vector model has been extended 

both theoretically and practically to a finite set of sparse vectors sharing a common non-zero location set. In this 
paper, we treat a broader framework in which the goal is to recover a possibly infinite set of jointly sparse vectors. 
Extending existing recovery methods to this model is difficult due to the infinite structure of the sparse vector set. 
Instead, we prove that the entire infinite set of sparse vectors can recovered by solving a single, reduced-size finite- 
dimensional problem, corresponding to recovery of a finite set of sparse vectors. We then show that the problem 
can be further reduced to the basic recovery of a single sparse vector by randomly combining the measurement 
' vectors. Our approach results in exact recovery of both countable and uncountable sets as it does not rely on 

^ discretization or heuristic techniques. To efficiently recover the single sparse vector produced by the last reduction 

, step, we suggest an empirical boosting strategy that improves the recovery ability of any given sub-optimal method 

OO . 

. for recovering a sparse vector Numerical experiments on random data demonstrate that when applied to infinite 

sets our strategy outperforms discretization techniques in terms of both run time and empirical recovery rate. In 
' the finite model, our boosting algorithm is characterized by fast run time and superior recovery rate than known 

' popular methods. 

Index Terms 

Basis pursuit, compressed sensing, multiple measurement vectors (MMV), orthogonal matching pursuit (OMP), 
sparse representation. 



I. Introduction 

Many signals of interest often have sparse representations, meaning that the signal is well approximated by 
only a few large coefficients in a specific basis. The traditional strategy to capitalize on the sparsity profile is to 
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first acquire tiie signal in a high-dimensional space, and then utilize a compression method in order to capture 
the dominant part of the signal in the appropriate basis. Familiar formats like MP3 (for audio signals) and JPEG 
(for images) implement this approach. The research area of compressed sensing (CS) challenges this strategy by 
suggesting that a compact representation can be acquired directly. 

The fundamentals of CS were founded in the works of Donoho [1] and Candes et. al. [2]. In the basic model, 
referred to as a single measurement vector (SMV), the signal is a discrete vector x of high dimension. The sensing 
process yields a measurement vector y that is formed by inner products with a set of sensing vectors. The key 
observation is that y can be relatively short and still contain the entire information about x as long as x is sparsely 
represented in some basis, or simply when x itself contains only a few non-zero entries. An important problem in 
this context is whether the vector x producing y is unique [3]. Another well studied issue is the practical recovery 
of X from the compressed data y, which is known to be NP-hard in general. Many sub-optimal methods have been 
proposed for this problem [1],[2],[4],[5], which achieve a high recovery rate when tested on randomly generated 
sparse vectors. 

The SMV model has been extended to a finite set of jointly sparse vectors having their non-zeros occurring 
in a common location set. The sensing vectors are applied to each of the sparse vectors resulting in multiple 
measurement vectors (MMV). This model is well suited for problems in Magnetoencephalography, which is a 
modality for imaging the brain [6], [7], [8]. It is also found in array processing [6], [9], nonparametric spectrum 
analysis of time series [10] and equalization of sparse communication channels [11], [12]. The issue of uniqueness 
in the MMV problem was addressed in [13], [14], together with extensions of SMV recovery techniques to MMV. 

In this paper, we start from a broader model which consists of an infinite set of jointly sparse vectors, termed 
infinite measurement vectors (IMV). The set may be countable or uncountable (for example, when described 
over a continuous interval). The IMV model is broader than MMV and naturally arises in recovery problems 
involving analog signals, such as our earlier work on multi-band signals [15]. As we explain further in the paper, 
the recovery of the entire infinite set of sparse vectors in IMV models is highly complicated. A straightforward 
recovery approach in this context is to consider only a finite subset of vectors using discretization. However, this 
strategy cannot guarantee perfect recovery. Instead, we derive a reduced finite-dimensional problem from which 
the common non-zero location set can be inferred exactly. This paradigm relies on the observation that once the 
non-zero locations are identified, the original recovery problem translates into a simple linear inversion with a 
closed form solution. 

Our first main contribution is a theoretical result showing that for every given IMV problem there is an explicit 
MMV counterpart with the same non-zero location set. This reduction to finite dimensions is achieved without 
any discretization or heuristic techniques and thus allows in principle an exact recovery of the entire set of 
sparse vectors. Other papers that treated problems involving infinite vector sets used discretization techniques to 
approximate the solution [16] or alternatively assumed an underlying discrete finite-dimensional signal model [17]. 
In contrast, our approach is exact as neither the IMV model nor the solution is discretized. Once the IMV problem 
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Fig. 1: The entire flow of the paper consists of: (I) a deterministic reduction from IMV to MMV, (II) a random 
reduction from MMV to SMV and (III) a boosting stage. The ReMBo algorithm is a formal description of the 
last two steps. 



is reduced to an MMV problem, results developed in that context can be applied. 

To further improve the recovery performance both in terms of speed and recovery rate, we develop another 
theoretical result allowing to identify the non-zero locations of a given MMV model from a sparse vector of a 
specific SMV system. As opposed to the IMV reduction, our strategy here is to construct a random SMV problem 
that merges the set of sparse vectors using random coefficients. We prove that this reduction preserves the crucial 
information of the non-zero location set with probability one. 

Our final contribution treats the practical aspect of using a sub-optimal technique to find the sparse vector of an 
SMV problem. While examining popular SMV recovery techniques, we observed that the recovery ability depends 
on the exact non-zero values and not only on their locations. Based on this observation we argue that it is beneficial 
to draw several realizations of the merged measurement vector, by using different random combinations, until a 
sparse vector is identified. These iterations are referred to as the boost step of our method, since, empirically, 
each iteration improves the overall recovery rate of the non-zero location set. We formulate a generic algorithm, 
referred to as ReMBo, for the Reduction of MMV and Boosting. The ReMBo algorithm yields different recovery 
techniques for MMV based on the embedded SMV technique. 

The results presented in this paper provide a complete flow between the recovery problem of different models. 
Fig. 1 depicts the entire flow which can be initiated from a given IMV system or an arbitrary MMV problem. 
Numerical experiments demonstrate the performance advantage of methods derived from the ReMBo algorithm 
over familiar MMV techniques in terms of empirical recovery rate and run time. In addition, we present a simulation 
emphasizing the advantage of the IMV reduction over a discretization technique. 

The outline of the paper is as follows. The IMV model is introduced in Section II, where we also discuss 
conditions for a unique solution. The deterministic reduction method of IMV to MMV and the random reduction 
of MMV to SMV are developed in Sections III and IV respectively. The description of the ReMBo algorithm 
follows in Section V. Numerical experiments are provided in Section VI. 

II. Infinite-Measurement-Vectors Model 
Let A be a given m x n matrix with m < n and consider the parametric linear system: 



y(A) = Ax(A), A G A, 



(1) 
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where A is some known set. Our goal is to recover the unknown vector set x(A) = {x(A)}agA which is referred 
to as the solution set, from the measurements set y(A) = {y(A)}AeA- The cardinality of the parameter set A is 
arbitrary including both finite (single or multiple element) sets and infinite sets (countable or uncountable). For 
example, A can be the index of a discrete set, or alternatively a variable over a continuous interval. 

Clearly, the recovery problem is not well defined unless there is a unique solution set x(A) for each y(A). 
However, the system of (1) does not posses a unique solution in general, since for every A, (1) contains less 
equations than unknowns. Specifically, each y(A) is a vector of length m, while the corresponding x(A) is of 
length n > m. Therefore, in order to guarantee a unique solution an additional prior on x(A) must be associated 
with (1). Throughout this paper, we assume the joint sparsity prior, which constrains each x(A) to have only a 
few non-zero entries and in addition requires that all the vectors in x(A) share a common non-zero location set. 
The system of (1) is termed IMV when A is infinite and the joint sparsity prior is assumed. In the sequel, this 
prior is formally described and is used to derive a sufficient condition for the uniqueness of the IMV solution set. 

A. SMV Model 

We start by describing notation and a uniqueness result for the special case of a single element set A, in which 
(1) is abbreviated as y = Ax. This corresponds to the well studied SMV model. 

A vector x is called K-sparse if it contains no more than K non-zero entries. For a given vector x the support 
function 

/(x) = {A:|xfc/0}, (2) 

describes the locations of the non-zero entries where x^ stands for the A:th entry of x. Thus, a K-sparse vector 
X conforms with a support size |/(x)| < K. A sufficient condition for the uniqueness of a K-sparse solution in 
this setting can be stated in terms of the Kruskal-rank of a matrix, which was first used in the seminal work of 
Kruskal [18]: 

Definition 1: The Kruskal-rank of A, denoted cr(A), is the maximal number q such that every set of q columns 
of A is linearly independent. 

Theorem 1: If the vector x is a i^-sparse solution of y = Ax and ct(A) > 2K, then x is the unique K-sparse 
solution of the system. 

Theorem 1 and its proof are given in [3], [14] with a slightly different notation of Spark(A) instead of the Kruskal- 
rank. 



TABLE I: Sparsity Models and Priors 



5 



Model 


A Cardinality 


Linear System 


K-sparsity prior 


SMV 


1 


y = Ax 


|I(x)|<K 


MMV 


d 


Y = AX 


\m\<K 


IMV 


Infinite 


y(A) = Ax(A) 


\mm\<K 



B. Uniqueness in IMV Models 

The joint sparsity prior becomes relevant when A contains more than a single element. By abuse of notation, 
we define the support function of a vector set as the union over the support of each vector. Specifically, 

/(x(A)) = U /(x(A)) (3) 

AeA 

= {I < k < n\ Xfc(Ao) 7^ 0, for some Aq G A} . 

For brevity, a jointly sparse solution set x(A) with |/(x(A))| < is also called /^-sparse. A A'-sparse vector 
set x(A) implies two properties: (I) Each x(A) is a A'-sparse vector, and (II) the non-zero entries of x(A) are 
confined to a fixed location set of size no more than K. The system of (1) is called MMV in the literature when 
the joint sparsity prior holds over a finite set of sparse vectors [13], [14]. Similarly, we refer to the system of (1) 
as IMV when A is an infinite set and x(A) is jointly sparse. Table I summarizes the models derived from (1) for 
different cardinalities of the set A. The abbreviations used for the linear systems of MMV and IMV are clear from 
the context. Evidently, the joint sparsity prior is what distinguishes MMV and IMV models from being a set of 
independent SMV systems. 

The first property of the joint sparsity prior implies that o-(A) > 2K is sufficient to guarantee the uniqueness 
of a A'-sparse solution set x(A), since we can consider the SMV problem y(A) = Ax(A) for each A separately. 
Exploiting the joint sparsity, we expect that a value of o-(A) less than 2K would suffice to ensure uniqueness. 
Extending uniqueness results regarding MMV [13], [14] leads to the following proposition: 

Proposition 1: If x(A) is a A'-sparse solution set for (1), and 

(j(A) >2K - (dim( span(y(A)) ) - 1) , (4) 
then x(A) is the unique A'-sparse solution set of (1). 

The notation span(y(A)) is used for the subspace of minimal dimension containing the entire vector set y(A). Note 
that span(y(A)) is guaranteed to have finite dimension since y(A) has finite length. For jointly sparse solution sets. 
Proposition 1 indeed claims that the required Kruskal-rank of A can be generally lower than 2 AT of Theorem 1 . 

Proof: The solution set x(A) is A'-sparse which implies that dim(span(x(A))) < A'. It follows from 
the linear system of (1) that the dimension of the subspace span(y(A)) cannot be higher than span(x(A)), i.e. 
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r = dim{span(y(A))} < K. From (4) we get that cr(A) > K. Consequently, for each y(A) = the corresponding 
unique i^-sparse vector is x(A) = 0, as the null space of A cannot contain other i('-sparse vectors. Therefore, 
without loss of generality we can prove the claim for a measurement set y(A) with r > 1 which does not contain 
zero vectors. 

For r > 1 there exists a finite set A = {Ai}[^^ C A such that the vector set y(A) is linearly independent. Since 
A is a finite set, y(A) = Ax(A) is an MMV system. According to [13], [14], the corresponding solution set x(A) 
is unique under the condition (4). Since y(A) does not contain zero vectors, every vector y(A) belongs to some 
subset of r linearly independent vectors. The argument above implies the uniqueness of the corresponding subset 
of x(A), and consequently the uniqueness of the entire solution set. ■ 

Note that (1) can be viewed as a sampling process, where x(A) is the signal, A the sampling operator and 
y(A) is the generated set of samples. In this context, the design of the sampling operator requires to determine 
the number of rows in A such that the samples match a unique signal. However, (4) cannot be used for this task 
since the value of dim( span(y(A)) ) is not known a-priori. In other words, if a matrix A needs to be designed 
such that uniqueness is guaranteed to every A'-sparse solution set, including those with dim{span(y(A))} = 1, 
then the condition (4) is reduced to (t(A) < 2K of Theorem 1. 

In the remainder of the paper, we assume that a unique solution of (1) is guaranteed by either Theorem 1 or 
Proposition 1. In the next sections, we develop the main contributions of our work which address the recovery of 
the unique i^-sparse solution set x(A). 

III. Dimension Reduction for Infinite A 

A. Optimization Viewpoint 

Before discussing the IMV model we review the optimization viewpoint for the SMV and MMV problems. 
If X is the unique i^-sparse solution of the SMV problem y = Ax, then it is also the unique sparsest solution. 
Therefore, recovery of x can be formulated as an optimization problem [1]: 

X = argmin ||x||^j, s.t. y = Ax, (5) 

X 

where the pseudo-norm £o counts the number of non-zero entries in x. In our notation the objective can be replaced 
by |/(x)|. Since (5) is known to be NP-hard [1],[2], several alternatives have been proposed in the literature. Donoho 
[1] and Candes et. al. [2] rigorously analyze the basis pursuit technique which uses the £i norm instead of the £o 
in (5) resulting in a tractable convex program. Various greedy techniques to approximate the sparsest solution have 
also been studied thoroughly [4], [5], [6]. Empirically, all these methods show a high recovery rate of the unique 
sparsest solution when tested on random data. Analogously, it was shown that the combinatorial problem 

X = argmin|/(X)| s.t. Y = AX, (6) 
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recovers the unique K-sparse solution matrix of an MMV system [14]. This optimization problem is also NP-hard 
and can be tackled with similar efficient sub-optimal techniques [13], [14]. 

Extending the optimization viewpoint to the IMV model leads to the equivalent problem: 

x(A) = argmin |/(x(A))| (7) 

x(A) 

s.t. y(A) = Ax(A),VA G A. 

Note that in (7) there are infinitely many unknowns x(A), and infinitely many equations. In contrast to the finite 
formulation of both (5) and (6), a program of the type (7) was not analyzed in the optimization literature. The most 
relevant programming structures are semi-infinite programming [19] and generalized semi-infinite programming 
[20]. However, these formulations allow only for infinite constraints while the optimization variable is finite. This 
inherent intricacy of (7) remains even if the objective is relaxed by known strategies. To overcome this difficulty, 
we suggest to transform (7) into one of the forms known in the literature. Specifically, we show that the joint 
sparsity prior allows to convert (7) into a problem of the form (6), in which both the variable and the constraint 
set are finite. 

A straightforward approach to reduce (7) to a finite-dimensional problem is to choose a finite grid A C A, and 
then solve only for x(A). This yields an MMV system corresponding to the optimization problem (6). In turn, this 
program can be relaxed by any of the known CS techniques. The final step is to approximate x(A) by interpolating 
the partial solution set x(A). However, a discretization approach typically results in an approximation x(A) that 
is different from the unique solution x(A). Moreover, x(A) typically does not satisfy (1) between the grid points, 
that is for A ^ A. This drawback of discretization happens even if a brute-force method is used to optimally 
find the solution set x(A) on the grid A. Furthermore, the density of the grid directly impacts the complexity 
of discretization techniques. For these reasons, we avoid discretization and instead propose an exact method that 
transforms the infinite structure of (7) into a single MMV system without loosing any information. A numerical 
experiment illustrating the difference between our exact method and discretization is provided in Section VI-C. 

B. Paradigm 

In order to solve (7) exactly we split the problem into two sub-problems. One is aimed at finding the support 
set S = /(x(A)). The other reconstructs x(A) from the data y(A) and the knowledge of S. The reason for this 
separation is that once S is known the linear relation of (1) can be inverted exactly. To see this, let A5 denote 
the matrix containing the subset of the columns of A whose indices belong to S. Since the solution set x(A) is 
K-sparse we have that |5| < K. In addition, from Proposition 1, cr(A) > K. Therefore A^ consists of linearly 
independent columns implying that 

{As)^As = I, (8) 
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where (•)^ is the Moore-Penrose pseudo-inverse operation. Explicitly, (As')^ = (Ag As) ^ where A^ denotes 
the conjugate transpose of As- Using S the system of (1) can be written as 

y(A) = A5X^(A), AeA, (9) 

where the superscript x'^(A) is the vector that consists of the entries of x(A) in the locations S. Multiplying (9) 
by (A^)^ from both sides gives 

x^(A) = (A5)V(A), A G A. (10) 
In addition, it follows from the definition of the support set I(x(A)) that 

Xi(A) = 0, Vi^5,AGA. (11) 
Therefore (lO)-(ll) allow for exact recovery of x(A) once the finite set S is correctly recovered. 

C. Method 

The essential part of our method is the first sub-problem that recovers 5 from the measurement set y(A). Our key 
observation is that every collection of vectors spanning the subspace span(y(A)) contains sufficient information 
to recover S exactly, as incorporated in the following theorem: 

Theorem 2: Suppose (1) has a unique ii'-sparse solution set x(A) with S = /(x(A)) and that the matrix A^xn 
satisfies (4). Let V be a matrix of m rows such that the column span of V is equal to span(y(A)). Then, the 
linear system 

V = AU (12) 

has a unique /^-sparse solution U and /(U) = S. 

Proof: Let r = dim(span(y(A))) and construct an in x r matrix Y by taking some set of r linearly 
independent vectors from y(A). Similarly, construct the matrix X of size n x r by taking the corresponding r 
vectors from x(A). The proof is based on observing the linear system 

Y = AX. (13) 

We first prove that X is the unique i^-sparse solution matrix of (13) and that /(X) = S. Based on this result, the 
matrix U is constructed, proving the theorem. 

It is easy to see that /(X) C S, since the columns of X are a subset of x(A). This means that X is a K-sparse 
solution set of (13). Moreover, X is also the unique K-sparse solution of (13) according to Proposition 1. To 
conclude the claim on X it remains to prove that k £ S implies k G /(X) as the opposite direction was already 
proved. If A; G S", then for some Aq G A the kth entry of the vector x(Ao) is non-zero. Now, if x(Ao) is one of 
the columns of X, then the claim follows trivially. Therefore, assume that x(Ao) is not a column of X. We next 
exploit the following lemma: 
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Lemma 1 ([15]): For every two matrices A,P, if |/(P)| < o-(A) then rank(P) = rank(AP). 
Clearly, Lemma 1 ensures that rank(X) = r. In addition it follows from the same lemma that dim{span(x(A))} = 
r. Thus, x(Ao) must be a (non-trivial) linear combination of the columns of X. Since the kth entry of x(Ao) is 
non-zero, it implies that at least one column of X has a non-zero value in its kth entry, which means k € /(X). 

Summarizing the first step of the proof, we have that every r linearly independent columns of y(A) form an 
MMV model (13) having a unique K-sparse solution matrix X, such that /(X) = S. As the column span of 
V is equal to the column span of Y we have that rank(V) = r. Since V and Y have the same rank, and Y 
also has full column rank, we get that V = YR for a unique matrix R of r linearly independent rows. This 
immediately implies that U = XR is a solution matrix for (12). Moreover, U is K-sparse, as each of its columns 
is a linear combination of the columns of X. Proposition 1 implies the uniqueness of U among the iC-sparse 
solution matrices of (12). 

It remains to prove that /(tj) = /(X). To simplify notation, we write X* for the ith row of X. Now, U* = X*R, 
for every 1 < i < n. Thus, if X* is a zero row, then so is U*. However, for a non-zero row X*, the corresponding 
row U* cannot be zero since the rows of R are linearly independent. ■ 

The advantage of Theorem 2 is that it allows to avoid the infinite structure of (7) and to concentrate on finding 
the finite set S by solving the single MMV system of (12). The additional requirement of Theorem 2 is to construct 
a matrix V having a column span equal to span(y(A)) (i.e. the columns of V are a frame for span(y(A))). The 
following proposition suggests a procedure for creating a matrix V with this property. 

Proposition 2: If the integral 

Q = / y(A)y^(A)dA, (14) 

exists, then every matrix V satisfying Q = VV^ has a column span equal to span(y(A)). 

The existence of the integral in (14) translates into a finite energy requirement. Specifically, for countable A the 
integral exists if the sequence {yfc(Ai)}^i is energy bounded in the £2 sense for every 1 < k < m. For uncountable 
A, yfc(A) can be viewed as a function of A which is required to be integrable and of bounded energy in the L2 
sense for every I < k < m. Note that the matrix Q of (14) is positive semi-definite and thus a decomposition 
Q = VV^ always exists. In particular, the columns of V can be chosen as the eigenvectors of Q multiplied by 
the square-root of the corresponding eigenvalues. 

Proof: For finite A the claim follows immediately from the fact that every two matrices M, N with MM^ = 
NN^ have the same column space. Therefore, it remains to extend this property to infinite A. 

Let r = dim(span(y(A))) and define a matrix Y^xr as in the proof of Theorem 2. The columns of Y are 
linearly independent and thus Y^^ is well defined. Define the vector set d(A) = Y^(y(A)),A G A, where each 
d(A) is a vector of length r. By construction, the integral 

/ d(A)d^(A)dA = YtQ(Yt)^ = DD^ (15) 
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Fig. 2: The fundamental stages for the recovery of the non-zero location set S using only one finite-dimensional 
problem. 



exists. The last equality in (15) is due to the positive semi-definiteness of the integrand. Substituting into (14) we 
have that VV^ = (YD)(YD)^ which implies that the column spans of V and (YD) are the same. Since the 
column span of Y equals to span(y(A)), d(A) contains the columns of the identity matrix of size r xr, and thus 
D is invertible. In turn, this implies that span(Y) = span(YD). ■ 

The computation of the matrix Q depends on the underlying application. In [15] we considered this approach 
for the reconstruction of an analog multi-band signal from point-wise samples. This class of signals are sparsely 
represented in the frequency domain as their Fourier transform is restricted to several disjoint intervals. Imposing 
a blind constraint, namely that both sampling and reconstruction are carried out without knowledge of the band 
locations, yields an IMV system that depends on a continuous frequency parameter. As described in [15], in this 
application Q can be computed by evaluating correlations between the sampling sequences in the time domain. 
The existence of the integral in (14) corresponds to the basic requirement that the point-wise sampling process 
produces bounded energy sampling sequences. 

Fig. 2 summarizes the reduction steps that follow from Theorem 2 and Proposition 2. The flow of Fig. 2 was 
first presented and proved in our earlier work [15]. The version we provide here has several improvements over 
the preliminary one of [15]. First, the flow is now divided into two independent logical stages and the purpose of 
each step is highlighted. Second, each stage has a stand-alone proof as opposed to the technique used in [15] to 
prove the entire scheme at once. Mathematically, this separation allows us to remove the restriction imposed in 
[15] on V to have only orthogonal columns. Moreover, each block can be replaced by another set of operations 
having an equivalent functionality. In particular, the computation of the matrix Q of Proposition 2 can be avoided 
if other methods are employed for the construction of a frame V for span(y(A)). 

IV. Dimension Reduction for Finite A 

A. Objective 

We now address the finite case of an MMV system 

Y = AX, (16) 

with A an m X n rectangular matrix as before. Following the convention of Table I, Y is an m x d matrix, and 
the dimensions of X are n x d. We assume that a unique K-sparse solution matrix X with no more than K 
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non-identical zero rows exists. The unique solution X can be found by the optimization problem (6), which has 
known relaxations to tractable techniques. Our goal in this section is to rely on ideas developed in the context 
of the IMV model in order to reduce the dimension of the optimization variable of (6) before performing any 
relaxation. Note that the MMV system (16) is arbitrary and the results developed in the sequel do not assume a 
preceding stage of reduction from IMV. 

Applying the same paradigm of the infinite scenario, we aim to recover the support set S = /(X). This set 
contains the crucial information in the sense that once S is recovered the solution is obtained by (lO)-(ll), namely 
by inverting the relevant columns of A. An immediate corollary of Theorem 2 is that if Y does not have full 
column rank, then (16) can be reduced by taking an appropriate column subset of Y. However, we wish to improve 
on this trivial result. Specifically, we intend to find the support set S from a single SMV optimization of the form 
(5). Such a reduction method is beneficial as the dimensions of the unknown variable in (5) is n while in (6) it 
is nd. 

B. Method 

Our approach is to randomly merge the columns of Y into a single vector y. We then show that the set S can 
be extracted from the random SMV y = Ax. In order to derive this result rigorously we rely on the following 
definition from probability and measure theory [21], [22]: 

Definition 2: A probability distribution V is called absolutely continuous if every event of measure zero occurs 
with probability zero. 

A distribution is absolutely continuous if and only if it can be represented as an integral over an integrable 
density function [21], [22]. For example, Gaussian and uniform distributions have an explicit density function that 
is integrable and thus both are absolutely continuous. Conversely, discrete and other singular distributions are not 
absolutely continuous. The following theorem exploits this property to reduce (6) into (5): 

Theorem 3: Let X be the unique iC-sparse solution matrix of (16) with a{A) > 2K. In addition, let a be a 
random vector of length d with an absolutely continuous distribution and define the random vectors y = Ya and 
X = Xa. Then, for the random SMV system y = Ax we have: 

1) For every realization of a, the vector x is the unique K-sparse solution of the SMV. 

2) /(x) = /(X) with probability one. 

Proof: For every realization of a, the vector x is a linear combination of the jointly if-sparse columns of 
X, and thus x is i^-sparse. It is easy to see that x satisfies the SMV system and that Theorem 1 implies its 
uniqueness among the iC-sparse vectors. 

Denote S = /(X) and observe that the previous argument implies that /(x) C S for every realization of a. 
Therefore, it remains to prove that the event /(x) = S occurs with probability one. Expressing this event in terms 
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of the rows of X gives 

Prob{/(x) = 5} = Prob{a^ A/'(X^) , Vi € 5} (17) 

= 1 - Prob J a e J A/" (X*) I , 
I ies J 

where X* denotes the ith row of X, and TV (X*) = {v | XV = 0} is the null space of that row. Now, for every 
i G S, the row X* is not identically zero, and thus the dimension of J\f (X*) is d — 1. In other words, for every 
i G 5", we have that TV (X*) has a zero measure in the underlying sample space of a, which can be either R'^ or 
C''. The union in (17) is over the finite set S and thus has also a measure zero. The absolutely continuity of the 
distribution of a concludes the proof. ■ 
The randomness of a plays a main role in the reduction method suggested by Theorem 3. In fact, random 
merging is a best choice in the sense that for every deterministic linear merging there are infinite counterexamples 
in which the merging process would fail to preserve the support set S. For example, a simple summation over the 
columns of Y may fail if the non-zero values in a single row of X sum to zero. In contrast. Theorem 3 ensures 
that for every given MMV system and with probability one, the random reduction yields an SMV with the same 
non-zero location set. 

The result of Theorem 3 resembles a result of [23], in which the authors suggested merging the columns of 
Y using an ordinary summation. The non-zero locations were then estimated using a one step greedy algorithm 
(OSGA). It was shown that if the entries of X are random, drawn independently from a Gaussian distribution, 
then the set S can be recovered by OSGA with probability approaching one as long as ^ oo, that is when the 
number of columns in each of the matrices Y,X is taken to infinity. In contrast, our method does not assume a 
stochastic prior on the solution set X. Moreover, Theorem 3 holds with probability one for arbitrary finite and 
fixed values of d. 

V. The ReMBo Algorithm 

Theorem 3 paves the way to a new class of MMV techniques based on reduction to an SMV. In this approach, 
the measurement matrix Y is first transformed into a single vector y by drawing a realization of a from some 
absolutely continuous distribution. Then, an SMV problem of the type (5) is solved in order to find the support 
set S. Finally, the recovery of X is carried out by inverting the matrix As as in (lO)-(ll). 

Since (5) is NP-hard it is not solved explicitly in practice. Instead, many efficient sub-optimal techniques have 
been proposed in the literature that are designed to be tractable but no longer guarantee a recovery of the unique 
sparsest solution. Interestingly, we have discovered that repeating the reduction process of the previous section with 
different realizations of a is advantageous due to the following empirical behavior of these sub-optimal techniques. 
Consider two /^-sparse vectors x, x having the same non-zero locations but with different values. Denote by S an 
SMV technique which is used to recover x, x from the measurement vectors Ax, Ax respectively. Empirically, we 



13 

observed that S may recover one of the vectors x, x while faihng to recover the other, even though their non-zero 
locations are the same. As far as we are aware, this behavior was not studied thoroughly yet in the literature. 
In fact, Monte-Carlo simulations that are typically conducted in the evaluation of CS techniques may imply a 
converse conclusion. For example, Candes et. al. [2] analyzed the basis pursuit method for SMV when A is a 
row subset of the discrete time Fourier matrix. A footnote in the simulation section points out that the observed 
behavior seems to be independent of the exact distribution of which the non-zero entries are drawn from. This 
remark was also validated by other papers that conducted similar experiments. The conjecture that Monte-Carlo 
simulations are insensitive to distribution of the non-zero values appears to be true. Nevertheless, it is beneficial 
for a given SMV system to apply S on both measurement vectors Ax, Ax. Once the crucial information of the 
non-zero locations is recovered, the final step of inverting A5 leads to the correct solution of both x, x. 

The ReMBo algorithm, outlined in Algorithm 1, makes use of the reduction method and also capitalizes on the 
empirical behavior discussed above. In steps 4-7, the MMV system is reduced into an SMV and solved using a 
given SMV technique S. These steps produce a sub-optimal solution x, which is examined in step 8. If x is not 
sparse enough or is not well aligned with the measurements, then the reduction steps are repeated with another 
draw of the random vector a. We term these additional iterations the boosting step of the algorithm. Theorem 3 
ensures that each of the different SMV systems of step 6 has a sparse solution that preserves the required support 
set S with probability one. The iterations improve the chances to recover S by changing the non-zero values of 
the sparse solutions. Note that if the number of iterations exceed the pre-determined parameter Maxlters, then 
the algorithm is terminated. The content of the flag variable indicates whether X represents a valid solution. If 
flag=false, then we may solve the MMV system by any other method. 

In general, CS techniques can be clustered into two groups. Those of the first group search for the sparsest 
feasible solution. The other group contains approximation methods that fix the sparsity to a user-defined value and 
determine a solution in this set that is best aligned with the data. For example, basis pursuit [24] belongs to the 
first group, while matching pursuit [25] with a fixed number of iterations belongs to the second group. The ReMBo 
algorithm can be tuned to prefer either feasibility or sparsity according to user preference by selecting appropriate 
values for the parameters A', e. However, it is recommended to avoid an approximation technique of the second 
group when constraining only K to a desired value. The reason is that such a method makes the condition of step 
8 always true, and thus no boosting will occur. 

We now compare the behavior of ReMBo with standard MMV techniques in terms of computational complexity 
and recovery rate. Clearly, the complexity of SMV is lower due to the reduced number of unknowns. The reduction 
method itself is no more than one matrix multiplication which in practice is a negligible portion of the overall 
run time in typical CS techniques. Performance of different algorithms can also be evaluated by measuring the 
empirical recovery rate in a set of random tests [1], [2], [13], [14]. As we detail in the following section, for some 
parameters choices a single reduction iteration achieves an overall recovery rate that is higher than applying a 
direct MMV technique. For other parameter selections, a single iteration is not sufficient and boosting is required 
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Algorithm 1 ReMBo (Reduce MMV and Boost) 

Input: Y,A 

Control Parameters: K, e, Maxlters, S, V 
Output: X, 5, flag 

1: Set iter= 1 
2: Set flag=false 

3: while (iter < Maxlters) and (flag is false) do 

4: draw a random vector a of length d according to V. 
5: y = Ya 

6: Solve y = Ax using SMV technique S. Denote the solution x. 

7: S = /(x) 

8: if (151 < /sT) and (||y - Ax||2 < e) then 

9: flag=true 
10: else 
11: flag=flase 
12: end if 

13: Construct X using 5 and (10)-( 11) 

14: iter=iter+l 

15: end while 

16: return X, S, flag 



to increase the recovery rate of a ReMBo technique beyond that of a standard MMV. The results indicate that 
ReMBo based techniques are comparably fast even when boosting is employed. 

VI. Numerical experiments 

In this section we begin by evaluating the reduction and boosting approach for MMV systems. The behavior 
of the ReMBo algorithm is demonstrated when the produced SMV is solved using a sub-optimal method. Two 
representative MMV techniques are derived from Algorithm 1 and compared with other popular MMV techniques. 
We then present an experiment that demonstrates the benefits of the IMV reduction flow over a discretization 
technique. 

A. Evaluating ReMBo 

We choose m = 20, n = 30, d = 5 for the dimensions of (16). The following steps are repeated 500 times for 
each MMV technique: 

1) A real- valued matrix A of size 20 x 30 is constructed by drawing each of its entries independently from a 
Gaussian distribution with zero mean and variance one. 

2) For each value of 1 < If < 20 we construct a K-sparse real-valued solution matrix Xi^ of size 30 x 5. The 
non-zero values of X.k are also drawn from a Gaussian distribution in the same way described before. 

3) The MMV technique that is being tested is executed in order to recover each Xj<- from the measurement 
data AXi^. For ReMBo techniques, V is an i.i.d. uniform distribution in [—1, 1]''. 



TABLE II: Sub-Optimal Techniques 
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Model 


Tag 


Formal Description 


Type 




BP 


Basis Pursuit, (5) with objective x i, see [1],[2] 


Convex relaxation 


SMV 


OMP 


Orthogonal Matching Pursuit, see [13] 


Greedy 




FOCUSS 


FOcal Underdetermined System Solver, see [6] 


Greedy 




M-BP-£i [14] 


(6) with objective 7^^^(X) i 


Convex relaxation 




M-BP-£oo [5] 


(6) with objective 7^£^ (X) i 


Convex relaxation 




M-OMP 


MMV version of OMP, see [13] 


Greedy 


MMV 


M-FOCUSS 


MMV version of FOCUSS, see [13] 


Greedy 




ReMBo-BP 


ReMBo with S =BP 


Convex relaxation 




ReMBo-OMP 


ReMBo with S =OMP 


Greedy 



4) A correct solution is announced if X^^ is recovered exactly up to machine precision. 
The empirical recovery rate for each value of K is calculated as the percentage of correct solutions. We also 
collected run time data in order to qualitatively compare between the time complexity of the tested techniques. 
Rigorous complexity analysis and comparison are beyond the scope of this paper. Note that the selection of 
real-valued matrices is not mandatory and the results are also valid for complex values. However, we stick to 
the real-valued setting as it reproduces the setup of [13], [14]. In addition, the same empirical recovery rate is 
noticed when the non-zero entries of X.k are drawn from a non-Gaussian distribution (e.g. uniform distribution). 
This behavior strengthens the conjecture that Monte-Carlo analysis is insensitive to the specific distribution of the 
non-zero values. 

To simplify the presentation of the results. Table II lists the techniques that are used throughout the experiments. 
Short labels are used to denote each of the techniques. The notation TZ^^ (X) stands for a vector of length n such 
that its zth entry is equal to the £p norm of the ith row of X. In the sequel we denote the Max Iters parameter 
of ReMBo based techniques in brackets, for example ReMBo-BP[l]. A default value of Maxlters = rank(Y) is 
used if the brackets are omitted. This selection represents an intuitive choice, since after rank(Y) iterations, step 
5 of Algorithm 1 produces a vector y that is linearly dependent in the realizations of the previous iterations. This 
intuition is discussed later in the results. 

Note that there is a difference in deciding on a correct solution for SMV and MMV. In the latter, a solution is 
considered correct only when all the vectors in the matrix are recovered successfully, while in SMV a recovery 
of a single vector is required. Nevertheless, as both problems amount to recovering the finite support set, we plot 
the recovery rate curves of SMV and MMV techniques on the same scale. An alternative approach would be to 
adjust the SMV recovery curve so that it represents the overall success rate when the SMV technique is applied to 
each of the columns separately. Adjusting the results according to this approach will only intensify the improved 
recovery rate of ReMBo based techniques. 
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Fig. 3: Comparison of MMV techniques based on convex relaxations. The ReMBo techniques are in solid lines. 
As expected, the recovery curves of ReMBo-BP[l] and BP coincide. 



In Fig. 3 we compare between MMV techniques based on convex relaxation of (6). For reference we also draw 
the recovery rate of BP on a single measurement column. It is seen that both M-BP(^i) and M-BP(£oo) suffer 
from a decreased recovery rate with respect to BP. In contrast, the recovery rate of ReMBo-BP improves on BP 
due to the boosting effect. In addition, as revealed from Fig. 7 the average run time of ReMBo-BP is also lower 
than the run time of either M-BP(^i) and M-BP(£oo)- Clearly, this simulation shows that besides the theoretical 
interest in the special convex relaxation of M-BP-^i and M-BP-£oo> in this example these method do not offer 
a practical benefit. Furthermore, the M-BP techniques require the selection of a row norm besides the standard 
selection of £i norm for the final column vector. The reduction method allows to avoid this ambiguous selection 
by first transforming to an SMV problem. 

Matching pursuit (including its variations) and FOCUS S are both greedy methods that construct the set S 
iteratively. These techniques are typically faster than basis pursuit based methods as seen in Fig. 7. In addition, 
extending the SMV version of these techniques into MMV is immediate. As opposed to convex relaxation methods, 
these approaches demonstrate an improved recovery rate when a joint sparsity prior is introduced. This behavior 
is depicted in Fig. 4. A comparison of these methods with ReMBo techniques is shown in Fig. 5. It is seen that 
ReMBo-OMP outperforms M-OMP and M-FOCUSS over the range 1 < K < 13. Specifically, in the intermediate 
range 10 < < 13 it reaches a recovery rate that is approximately 10% higher than the maximal recovery rate 
of the non-ReMBo techniques. In addition, the run time of the ReMBo-OMP is not far from the direct greedy 
approaches as seen from Fig. 7. 

In order to emphasize the impact of iterations. Fig. 6 depicts the recovery rate of ReMBo-BP and ReMBo-OMP 
for different values of Maxlters. The recovery rate at K = 10 is of special interest as according to Theorem 3 
(t(A) > 2K is required' to ensure that the random instances of SMV preserve the set S. For example, a single 

'According to [1],[2], a matrix with random entries has a full column rank and a full Kruskal rank with an overwhelming probability. In 
our setup the maximal value of (j(A) is m = 20. Empirically, it was also noticed that rank(Y) = 5 in all generated measurements. 



B. Results 
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(a) OMP (b) FOCUSS 



Fig. 4: The recovery rate of sequential selection techniques is demonstrated for MMV and SMV with the same 
number of non-zero entries per solution vector. The stopping criteria for OMP is based on the residual. The FOCUSS 
algorithm is designed to produce a i^-sparse approximation of the solution (for this reason a ReMBo-FOCUSS 
method is not tested as it cannot exploit the boosting strategy). 




M-OMP 
M-FOCUSS 
ReMBo-OMP 
ReMBo-BP 
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Fig. 5: A comparison between popular MMV techniques and ReMBo derived methods. 

iteration of ReMBo-BP achieves a recovery rate of 54%, while two and five iterations improve the recovery rate 
to 74% and 91% respectively. A higher number of iterations results in a minor improvement conforming with 
our intuitive default selection of Maxlters = rank(Y). However, the condition of K < 10 is only sufficient and 
empirical recovery is allowed to some extent even for K > 10. This behavior is common to all the techniques 
tested here as shown in Figs. 3-6. In this range of K > 10, repeating the reduction process for more than rank(Y) 
can be beneficial. For example, ReMBo-BP[20] yields a recovery rate of 56% for K = 14 instead of 25% when 
allowing only Maxlters=5. 

C. IMV Reduction vs. Discretization 

We now extend the previous setup in order to simulate an IMV model by letting d = 10000. To discretize the 
IMV system, g evenly spaced columns of Y are chosen resulting in an MMV system whose sparsest solution is 
searched, where 1 < g < 200. Since the non-zero values are drawn randomly, interpolation of the missing columns 




Fig. 6: The impact of boosting iterations for various selections of Maxlters. 




5 10 15 20 

K 



Fig. 7: Average run time of various MMV techniques. 



is no useful in this setting. Instead, we consider an approximation of the non-zero location set S by taking the 
support of the solution matrix on the chosen grid. Finally, the entire solution set is recovered by (lO)-(ll). In 
order to capitalize on the difference between the IMV reduction flow of Fig. 1 and this discretization technique, 
we consider A'-sparse solution matrices Xa' such that each non-zero row of Xa' has only a few non-zero entries 
(e.g. up to 150 non-zero values). For a fair comparison, the M-OMP technique is used for the recovery of X/^ in 
both methods. 

The empirical recovery rate for several values of g is shown in Fig. 8. It is evident that a discretization technique 
of this type requires a grid of g = 200 to approach a reasonable recovery rate, which is still below the recovery 
rate of the IMV flow. In order to explain the superior performance of the IMV flow we plot a typical structure of 
a solution set in Fig. 9. It is clear that discretization may fail as it does not capture the entire information of the 
solution set. In contrast, our approach preserves the necessary information required for perfect reconstruction of 
the solution set, namely the non-zero location set. Furthermore, comparing the average run time of both approaches 
reveals that IMV is even faster than discretization having a similar recovery rate. Note that the density of the grid 
influences the run time of discretization methods. In the example above of g = 200, discretization yields an MMV 



19 



0.6 



0.4 



0.2 









V 






\ 


^ ^ ^g=20p \ 




■\ 

\ 

\ 


< V . . . 

\ \ 
\ \ 

:. , , ,\. ■ .X . 




* \ 

\ 

\ 


g=i00 \ \ 

\ N \ 




\g=40 
s 




IMV flow 

- - - Discretization 



10 
K 



15 




Fig. 8: 



(a) (b) 
A comparison of (a) the recovery rate and (b) the average run time between the IMV flow and discretization. 
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Fig. 9: A typical structure of the solution set and a grid selection. The grid cannot be synchronized with the 
non-zero locations. In this example, discretization technique would fail to reconstruct X2(A) whereas the IMV flow 
guarantees an exact recovery of the support set S. 



system with 200 columns. The IMV flow does not have this drawback, as it follows from Lemma 1 that the matrix 
V can be chosen such that it consists of no more than K < 20 columns. 

VII. Conclusions 

The essence of the reduction theorems developed in this paper is that the recovery of an arbitrary number of 
jointly sparse vectors amounts to solving a single sparse vector of an SMV This result applies to the finite case of 
MMV and to the broader model of IMV which we introduced here. The key observation used in our developments 
is that the non-zero location set is the crucial information for the exact recovery of the entire solution set. We 
prove that this set can be recovered from a low dimensional problem rather than directly from the given high 
dimensional system. 

The explicit recovery problem of sparse vectors is a difficult combinatorial optimization program. Various 
methods to approximate the sparse solution of a given program have been previously proposed. However, to the 
best of our knowledge, a direct simplification of the explicit combinatorial formulation, in the way described here. 
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was not studied so far. Furthermore, in a typical CS setting the sensing process involves randomness while the 
reconstruction is deterministic. The reduction method for MMV shows that randomness can also be beneficial in 
the reconstruction stage. In addition, popular recovery techniques have a fixed performance in terms of run time 
and recovery rate. In contrast, the ReMBo algorithm is tunable as it allows to trade the run time by the overall 
recovery rate. The simulations conducted on several ReMBo methods demonstrate this ability and affirm that these 
methods outperform other known techniques. 
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